
Non-record: Three Approaches + Lessons Learned (best: 1.1188 BPB)#1001

Open
ibarrajo wants to merge 1 commit into openai:main from ibarrajo:submission/non-record-approaches


@ibarrajo

Summary

Three approaches tested, all rule-compliant. Best legal result: 1.1188 BPB (s_0 TTT only).

Previous PR #991 was closed because TTT re-scored tokens after training. This submission reports only the legal s_0 score. All GPTQ calibration runs fit within the 600s training budget.

| Approach | val_bpb | Notes |
| --- | --- | --- |
| A (#569 VRL+GPTQ, int5, no TTT) | 1.1317 | int5 penalty on d=512 |
| B (#576 d=576 int5, no TTT) | 1.1249 | Strong base |
| B + legal s_0 TTT | 1.1188 | Score-first only, no re-eval |
| C (GEPA int5 + TTT) | N/A | Artifact 16.3MB, over limit |

Lessons learned

  1. TTT re-scoring is illegal: only the cumulative s_0 score from the first pass counts.
  2. int5 quantization on d=512 costs +0.014 BPB vs int6.
  3. Legal s_0 TTT gives a -0.006 BPB improvement.
  4. GPTQ calibration must fit within the 600s training budget; our script reserves time for it and asserts.
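Lesson 4 can be sketched as a simple wall-clock guard. This is a minimal illustration, not the submission's actual script: the function names and the 60s reserve for GPTQ are hypothetical, and only the shape of the assertion (train + GPTQ measured against the single 600s budget) comes from the PR text.

```python
import time

TRAIN_BUDGET_S = 600.0  # total training budget from the rules
GPTQ_RESERVE_S = 60.0   # hypothetical slice reserved for GPTQ calibration

def run_training_and_gptq(train_fn, gptq_fn):
    """Run training, then GPTQ calibration, asserting both fit one budget."""
    start = time.monotonic()
    # train_fn is expected to stop on its own before the reserved deadline
    train_fn(deadline=start + TRAIN_BUDGET_S - GPTQ_RESERVE_S)
    gptq_fn()
    elapsed = time.monotonic() - start
    assert elapsed < TRAIN_BUDGET_S, f"train+gptq took {elapsed:.1f}s, over budget"
    return elapsed
```

The key design point is that calibration shares the training clock rather than getting its own, so an overlong training run fails the assert instead of silently eating the GPTQ reserve.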

Rule compliance

  • GPTQ calibration within training budget (assert: train+gptq < 600s)
  • Artifact < 16MB (assert in code)
  • Eval < 600s (assert in code)
  • TTT reports s_0 only — NO re-scoring after training
  • No val tokens in artifact
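The s_0-only rule above amounts to a score-first loop: each token is scored under the current parameters before any test-time update, and earlier scores are never recomputed. A minimal sketch follows; the `model.nll` / `model.ttt_update` interface is hypothetical, and the nats-to-bits conversion assumes one token per byte for simplicity.

```python
import math

def legal_s0_bpb(model, tokens):
    """Score-first TTT: score each token BEFORE updating, never re-score."""
    total_nll = 0.0
    for i, tok in enumerate(tokens):
        # 1) score the next token under the CURRENT parameters; this is
        #    the only number that ever counts toward cumulative s_0
        total_nll += model.nll(context=tokens[:i], target=tok)
        # 2) only then apply a test-time update, which may help FUTURE tokens
        model.ttt_update(tokens[:i + 1])
    # convert summed nats to bits per byte (assumes 1 token == 1 byte here)
    return total_nll / math.log(2) / len(tokens)
```

Re-scoring tokens after training, as in the closed PR #991, would mean calling `model.nll` again on earlier positions after later `ttt_update` calls, which this loop structurally cannot do.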

Based on PRs #569, #576, #505. Submitted as non-record data points.

🤖 Generated with Claude Code

Approach A (openai#569 int5 no TTT): 1.1317 — int5 penalty too high on d=512
Approach B (openai#576 d=576 int5 + legal s_0 TTT): 1.1188 — best legal result
Approach C (GEPA int5 + TTT): artifact over 16MB

Key lesson: TTT re-scoring is illegal (PR openai#991 closed for this).
Only s_0 cumulative first-pass score is legal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
